
perf(topn): reduce unnecessary table scan in streaming (Group)TopN executors #13832

Merged: 16 commits into main from rc/fix-group-topn-too-many-table-scan on Dec 13, 2023

Conversation

@stdrc (Member) commented on Dec 6, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

According to #13797 and some further investigation, when using a temporal filter with a short time period, the high part of the TopN cache is very likely to be frequently deleted until it becomes empty. And because TopN executors (actually the managed TopN state) blindly scan the state table to refill the high cache, performance can be significantly impacted.

This PR maintains a row_count in the managed TopN state, so that when the cache is empty, the table scan can be skipped if the managed state knows the cache is still in sync with the table (i.e. the row counts match).

Fixes #13797.
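
As an illustration of the idea, here is a minimal, self-contained Rust sketch. SketchStateTable, ManagedTopNStateSketch, and the key/payload types are hypothetical stand-ins, not the actual types; the real changes live in src/stream/src/executor/top_n/top_n_state.rs and the TopN cache code.

// Hedged sketch only, not the actual ManagedTopNState.
use std::collections::BTreeMap;

#[derive(Default)]
struct SketchStateTable {
    rows: BTreeMap<i64, String>, // (order key, payload)
}

struct ManagedTopNStateSketch {
    state_table: SketchStateTable,
    /// Number of rows in the state table, or `None` while it is unknown
    /// (e.g. for a freshly seen group, before a complete table scan).
    row_count: Option<usize>,
}

impl ManagedTopNStateSketch {
    fn insert(&mut self, key: i64, payload: String) {
        self.state_table.rows.insert(key, payload);
        if let Some(cnt) = self.row_count.as_mut() {
            *cnt += 1; // keep the count in sync with the table
        }
    }

    fn delete(&mut self, key: i64) {
        self.state_table.rows.remove(&key);
        if let Some(cnt) = self.row_count.as_mut() {
            *cnt -= 1;
        }
    }
}

fn main() {
    let mut state = ManagedTopNStateSketch {
        state_table: SketchStateTable::default(),
        row_count: Some(0), // known to be empty in this toy example
    };
    state.insert(1, "a".into());
    state.insert(2, "b".into());
    state.delete(1);
    assert_eq!(state.row_count, Some(1)); // the count tracks the table
}

With this bookkeeping, the managed state can tell when the in-memory cache already mirrors the whole table and skip the refilling scan, which is exactly the case the temporal-filter workload hits over and over.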

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

cache.insert(topn_row.cache_key, (&topn_row.row).into());
if cache.len() == cache_size_limit {
table_row_count = None; // cache becomes full, we cannot get precise table row count this time
A contributor commented on this code:

I think we should check this equality before the insert, so we can know that the number of records in Hummock is strictly larger than the cache size. Like the following:

if cache.len() == cache_size_limit {
    table_row_count = None; 
    break;
}
cache.insert(topn_row.cache_key, (&topn_row.row).into());

stdrc (Member, Author) replied:

Wait a minute, I'm still working on this. Not ready😁

stdrc (Member, Author) replied:

I don't quite get it. When fill_high_cache is called, the high cache is never full, so this part of the code just inserts into the cache until it reaches the size limit.
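
For readers following along, here is a hedged, self-contained sketch of the kind of fill loop being discussed. cache, cache_size_limit, and table_row_count mirror the snippet above; fill_high_cache_sketch and the key/row types are made up for illustration, and the real fill_high_cache in top_n_state.rs is more involved.

use std::collections::BTreeMap;

// Returns Some(count) only if the table iterator was fully consumed before
// the cache filled up; otherwise the precise count is unknown.
fn fill_high_cache_sketch(
    table_rows: impl Iterator<Item = (u64, String)>, // hypothetical (cache key, row)
    cache: &mut BTreeMap<u64, String>,
    cache_size_limit: usize,
) -> Option<usize> {
    let mut table_row_count = Some(0usize);
    for (cache_key, row) in table_rows {
        if let Some(cnt) = table_row_count.as_mut() {
            *cnt += 1; // count every row read from the table
        }
        cache.insert(cache_key, row);
        if cache.len() == cache_size_limit {
            // The cache filled up before the iterator was exhausted, so the
            // exact table row count cannot be known from this scan.
            table_row_count = None;
            break;
        }
    }
    table_row_count
}

fn main() {
    let mut cache = BTreeMap::new();
    let rows = (0..2u64).map(|k| (k, format!("row{}", k)));
    // Only 2 rows with a limit of 16: the scan completes, so the count is exact.
    assert_eq!(fill_high_cache_sketch(rows, &mut cache, 16), Some(2));
}

This is only meant to illustrate why the counted value can be trusted only when the scan finishes before the cache reaches its size limit.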

}

pub fn delete(&mut self, value: impl Row) {
self.state_table.delete(value);
if let Some(row_count) = self.row_count.as_mut() {
A contributor commented on this code:

Can we ensure that the deleted key must exist in Hummock?

stdrc (Member, Author) replied:

I believe we now have the state table delete sanity check, right?

@stdrc marked this pull request as ready for review on December 7, 2023 07:42
@@ -325,19 +351,20 @@ impl TopNCacheTrait for TopNCache<false> {
&& (self.offset == 0 || cache_key > *self.low.last_key_value().unwrap().0)
{
// The row is in mid
self.middle.remove(&cache_key);
A member commented on this code:

Why move this?

stdrc (Member, Author) replied:

We need to keep the entries in the cache consistent with the table.

@xxchan (Member) left a review comment:

The idea sounds OK to me. We may want to test whether it actually works.

@xxchan (Member) commented on Dec 8, 2023

Did you test whether the perf issue is resolved by this?

@stdrc (Member, Author) commented on Dec 8, 2023

> Did you test whether the perf issue is resolved by this?

Will run a test soon

The codecov bot commented on Dec 8, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (c74817f) 68.06% compared to head (30e76b7) 68.05%.
Report is 5 commits behind head on main.

Files                                                   Patch %   Missing
src/stream/src/executor/top_n/top_n_state.rs            82.35%    6 lines ⚠️
...tream/src/executor/top_n/group_top_n_appendonly.rs   0.00%     3 lines ⚠️
src/stream/src/executor/top_n/group_top_n.rs            66.66%    1 line ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13832      +/-   ##
==========================================
- Coverage   68.06%   68.05%   -0.02%     
==========================================
  Files        1535     1535              
  Lines      264826   264874      +48     
==========================================
+ Hits       180266   180269       +3     
- Misses      84560    84605      +45     
Flag Coverage Δ
rust 68.05% <87.95%> (-0.02%) ⬇️


@stdrc (Member, Author) commented on Dec 11, 2023

(Screenshots: 2023-12-11 13:15:40, 2023-12-11 13:16:13)

The throughput doesn't change much, but the bloom filter false-positive rate does decrease compared to #13797.

cc @st1page @Little-Wallace

@Little-Wallace (Contributor) commented

> The throughput doesn't change much, but the bloom filter false-positive rate does decrease compared to #13797.

Thanks for your work. Could you run the test again? #13558 has been merged and it may reduce some IO operations.

@stdrc (Member, Author) commented on Dec 11, 2023

> The throughput doesn't change much, but the bloom filter false-positive rate does decrease compared to #13797.
>
> Thanks for your work. Could you run the test again? #13558 has been merged and it may reduce some IO operations.

https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2649#018c597f-5684-4b84-a7db-eddfc753f995

Something went wrong...

(Screenshots: 2023-12-12 01:36:13, 2023-12-12 01:36:41)

@stdrc (Member, Author) commented on Dec 13, 2023

(Screenshot: 2023-12-13 11:12:12)

Now FPR drops to near zero.

@st1page (Contributor) commented on Dec 13, 2023

> The throughput doesn't change much, but the bloom filter false-positive rate does decrease compared to #13797.
>
> Thanks for your work. Could you run the test again? #13558 has been merged and it may reduce some IO operations.
>
> https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2649#018c597f-5684-4b84-a7db-eddfc753f995
>
> Something went wrong...
>
> (Screenshots: 2023-12-12 01:36:13, 2023-12-12 01:36:41)

The barriers pile up because at the beginning the source emits insert operations, and after 5 minutes the temporal filter starts emitting delete operations. Then performance degrades and the source executors are backpressured, but the temporal filter cannot be backpressured, so it keeps emitting delete messages.
Is #13271 included in your branch? If not, we can merge main and take a look.

@stdrc (Member, Author) commented on Dec 13, 2023

> Is #13271 included in your branch? If not, we can merge main and take a look.

It is included🥵

@Little-Wallace (Contributor) commented

Although the throughput does not increase much, this PR still reduces Hummock IOPS and the FPR:
(Screenshot)

@st1page (Contributor) commented on Dec 13, 2023

The low performance might be because of #13968.

@st1page (Contributor) left a review comment:

Let's merge the PR first because it actually solves part of the performance issue.

@stdrc enabled auto-merge on December 13, 2023 08:34
@stdrc added this pull request to the merge queue on Dec 13, 2023
Merged via the queue into main with commit 8db6191 Dec 13, 2023
26 of 27 checks passed
@stdrc deleted the rc/fix-group-topn-too-many-table-scan branch on December 13, 2023 09:22
@fuyufjh (Member) commented on Dec 14, 2023

Hey @st1page @stdrc, thanks for the work! Which query are you testing with? I'm wondering how much it solves the performance issue.

@stdrc (Member, Author) commented on Dec 15, 2023

> Which query are you testing with?

Nexmark q9-temporal-filter. The main contribution of this PR was to reduce table iter ops.

@fuyufjh (Member) commented on Dec 15, 2023

> Which query are you testing with?
>
> Nexmark q9-temporal-filter. The main contribution of this PR was to reduce table iter ops.

Can you share the effect? For example, how much were table iter ops reduced after this PR?

@stdrc (Member, Author) commented on Dec 15, 2023

> Which query are you testing with?
>
> Nexmark q9-temporal-filter. The main contribution of this PR was to reduce table iter ops.
>
> Can you share the effect? For example, how much were table iter ops reduced after this PR?

As originally reported in #13797, the bloom filter false-positive rate was high (up to 50% most of the time), which means nearly half of the table iter operations were wasted.

After this PR, the FPR is near 0% most of the time and around 20% during warm-up (many groups are new, so the manually maintained in-executor table_row_counts don't help yet). So we think this PR did solve one aspect of the issue.

However, throughput was still not improved as expected. As @st1page found, another problem (possibly the true IO bottleneck) is that GroupTopN doesn't fetch missing groups concurrently (#13968), which likely prevents batching at the storage layer.
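
To make the skip-scan decision concrete, here is a small hedged sketch; need_table_scan, row_count, and rows_in_cache are illustrative names rather than the actual API.

// Hedged sketch of the decision described above; illustrative names only.
// `row_count` is the per-group count maintained by this PR, `rows_in_cache`
// is the number of rows currently held in the TopN cache.
fn need_table_scan(row_count: Option<usize>, rows_in_cache: usize) -> bool {
    match row_count {
        // Freshly seen group (warm-up): table size unknown, must scan.
        None => true,
        // The cache already mirrors the whole table: nothing new to fetch.
        Some(n) if n == rows_in_cache => false,
        // The table holds more rows than the cache: scan to refill.
        Some(_) => true,
    }
}

fn main() {
    assert!(need_table_scan(None, 0));     // warm-up: scan
    assert!(!need_table_scan(Some(3), 3)); // in sync: skip the scan
    assert!(need_table_scan(Some(10), 3)); // table has more rows: scan
}

This matches the behavior described above: during warm-up the count is unknown and the scans (and thus the bloom-filter lookups) still happen, while in steady state the synced count lets most of them be skipped.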

wenym1 pushed a commit that referenced this pull request Dec 25, 2023
wenym1 pushed a commit that referenced this pull request Dec 25, 2023
wenym1 added a commit that referenced this pull request Dec 25, 2023

Successfully merging this pull request may close these issues.

perf(streaming): optimize topN cache